Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing

Authors

  • Jongwook Woo
  • Yuhang Xu
Abstract

The Map/Reduce approach has become popular for computing huge volumes of data since Google implemented its platform on the Google File System (GFS) and Amazon Web Services (AWS) began offering its services on the Apache Hadoop platform. Map/Reduce motivates redesigning existing sequential algorithms as Map/Reduce algorithms for big data, so this paper presents a Map/Reduce version of Market Basket Analysis, one of the popular data mining algorithms. The algorithm sorts the data set and converts it to (key, value) pairs to fit the Map/Reduce model. It is executed on the Amazon EC2 Map/Reduce platform. The experimental results show that performance improves as more nodes are added, but beyond a certain point a bottleneck prevents further gains. The bottleneck is believed to be caused by the operations of distributing, aggregating, and reducing data in Map/Reduce.
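The abstract says the algorithm sorts the data set and converts it to (key, value) pairs. The following is a minimal sketch of that idea, assuming the key is a pair of items bought together and the value is a count of 1; the paper's exact key scheme is not given here, the function names (`map_transaction`, `shuffle`, `reduce_pair`, `market_basket`) are hypothetical, and the shuffle phase is simulated in memory rather than run on Hadoop or Amazon EC2.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def map_transaction(transaction: Iterable[str]) -> Iterator[Tuple[Pair, int]]:
    """Hypothetical mapper: sort one basket and emit ((item_a, item_b), 1) per item pair."""
    # Sorting (and de-duplicating) makes ("milk", "bread") and ("bread", "milk")
    # collapse to the same key, so each pair is counted under one key.
    items = sorted(set(transaction))
    for pair in combinations(items, 2):
        yield pair, 1


def shuffle(mapped: Iterable[Tuple[Pair, int]]) -> Dict[Pair, List[int]]:
    """Group mapper output by key, standing in for the framework's shuffle/sort phase."""
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_pair(key: Pair, values: List[int]) -> Tuple[Pair, int]:
    """Hypothetical reducer: number of baskets that contain both items of the pair."""
    return key, sum(values)


def market_basket(transactions: Iterable[Iterable[str]]) -> Dict[Pair, int]:
    """Run map, shuffle, and reduce over all transactions in a single process."""
    mapped = (kv for t in transactions for kv in map_transaction(t))
    return dict(reduce_pair(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    baskets = [
        ["milk", "bread", "eggs"],
        ["bread", "milk"],
        ["eggs", "beer"],
    ]
    for pair, count in sorted(market_basket(baskets).items()):
        print(pair, count)
```

Because the reduce step is a plain sum, it is associative, so in a real Hadoop job the same logic could also serve as a combiner to shrink the data sent during the shuffle, which is one of the distribution and aggregation costs the abstract identifies as a likely bottleneck.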

Similar resources

Market Basket Analysis Algorithm on Map/Reduce in AWS EC2

As the web, social networking, and smartphone applications have become popular, data has grown drastically every day; such data is called Big Data. Google met Big Data earlier than others and recognized the importance of storing and computing Big Data. Thus, Google implemented its parallel computing platform with the Map/Reduce approach on the Google File System (GFS) in ord...

Heterogeneous Multi core processors for improving the efficiency of Market basket analysis algorithm in data mining

Heterogeneous multi-core processors can offer diverse computing capabilities. The efficiency of the Market Basket Analysis algorithm can be improved with heterogeneous multi-core processors. The market basket analysis algorithm uses the Apriori algorithm and is one of the popular data mining algorithms that can use the Map/Reduce framework to perform analysis. The algorithm generates association rule...

A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems

Cloud computing is a result of the continuing progress made in the areas of hardware, Internet-related technologies, distributed computing, and automated management. The increasing demand has led to an increase in services, resulting in the establishment of large-scale computing and data centers, in addition to high operating costs and huge amounts of electrical power consumption. Insuffic...

Task Scheduling Algorithm Using Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in Cloud Computing

Cloud computing is considered a computational model that provides resources for users' requests on demand. Scheduling users' jobs has emerged as an important challenge in the field of cloud computing. This is mainly due to several reasons, including ever-increasing advancements in information technology and an increase of applications and...

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...


Publication date: 2007